04. Perturbations on Columns

Perturbations on Columns

As we previously discussed, one way to generate different trees is to subsample the set of columns the trees use to generate splits. When random subsets of the dataset are drawn as random subsets of features, then the method is known as the Random Subspace Method .

MLND SL DT 13 Random Forests MAIN V2

Importance of Random Column Selection

Sometimes one feature will dominate in finance. If you don’t apply some type of random feature selection, then your trees will not be that different (i.e., will be correlated) and that reduces the benefit of ensembling.

What features are typically dominant? Well, we'll talk about this more later when we talk about feature engineering, but when we use random forests for alpha combination, some of our features are alpha factors. Classical, price-driven factors, like mean reversion or momentum factors, often dominate. You may also see that features that define industry sectors or market "regimes" (periods defined, for example, by high or low market volatility or other market-wide trends) are towards the root of the tree.